-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LowPtElectrons: updated BDT model and code base for ID #31080
Conversation
… CMSBParking with cms-merge-topic
The code-checks are being triggered in jenkins. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31080/17619
|
A new Pull Request was created by @bainbrid for master. It involves the following packages: RecoEgamma/EgammaElectronProducers @perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
test parameters: |
@cmsbuild please test |
The tests are being triggered in jenkins.
|
+1 |
@slava77 @afiqaize @SohamBhattacharya
Ah, true, my mistake - it is used using a standalone test we have privately, which fooled me.
I'm not sure how to do this. If we add a userFloat as you suggest, then everything else in this PR can remain the same?
Yes, this is true, but this is part of the RECO sequence, while we wish to execute the ID as part of the PAT sequence. So the reRECO test you refer to can bre used to validate the performance within the RECO sequence (once we replace the placeholder model), but - most importantly - we wish to test the ID as part of the PAT sequence using MINIAOD inputs (i.e. same as for the training). Am I correct? |
@bainbrid if you plan to run it in PAT but don't have the final model yet, I think it would be useful to already introduce the relevant python conf as suggested above to the workflows, so everything can be tested end to end. Perhaps we can get a feeling for the timing as well, even though the model is a placeholder. |
Hi @jpata. I’ve linked to an example PAT config in a post above, found [here](https://github.com/cms-sw/cmssw/compare/14635b1..355e62b). This is my “best guess” for an implementation. The issue is that no module then calls the ID producer (my mistake) so it’s never executed. @slava77 suggests adding to userFloats but I don’t have any experience of this. If you can point me to an example, I can attempt to add it.
… On 14 Aug 2020, at 09:37, Joosep Pata ***@***.***> wrote:
@bainbrid if you plan to run it in PAT but don't have the final model yet, I think it would be useful to already introduce the relevant python conf as suggested above to the workflows, so everything can be tested end to end.
Perhaps we can get a feeling for the timing as well, even though the model is a placeholder.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
How about consuming it here: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/PatAlgos/python/slimming/slimmedLowPtElectrons_cfi.py |
The configuration logic for electron modifiers starts around
This would include both VID and more direct insertions in RecoEgamma/EgammaTools/python/egammaObjectModificationsInMiniAOD_cff.py egamma experts may be more specific with a pointer. @jainshilpi @lsoffi @afiqaize @SohamBhattacharya |
It looks like my pointers are more specific to the standard slimmedElectrons |
@jainshilpi @lsoffi @afiqaize @SohamBhattacharya Is somebody available to take a look at this? I suspect it's straightforward for an expert to figure out how to implement this, but would take me some time to understand the MINIAOD tools. (I'm happy to implement if somebody can give pointers.) In short, our IDProducer (if called by something!) takes the slimmedLowPtElectrons collection as input and produces a ValueMap of floats, keyed off the slimmed collection. See my attempt (so far) here. |
@bainbrid the main consideration right now is that the new code is included in the jenkins tests. As has been pointed out by Slava, looks like the [1] #31080 (comment) |
@jpata @slava77 In Slava's comment, I followed the links to this comparison. It seems a little different to the one you point me too - perhaps just higher stats? (I will comment on this distribution.) I can say the following: the new distribution is shifted to lower values, which would be bad for signal and good for background. It's not clear to me if we are looking at signal or bkgd or both (presumably a mix, most likely mainly bkgd), as it depends on the MC sample used for the test. However, the new code we are testing here is likely to be incompatible with the model weights stored by default in cms-data, so a shift to lower values is probably my expectation. If you also included cms-data/RecoEgamma-ElectronIdentification#14 in this test (please can you confirm?), then the weights would be compatible with the new code. For this scenario, I cannot say with certainty what I would expect to see, on balance probably not a shift to lower values. In summary: a change is highly likely, a shift to lower values is quite possible, and the values don't seem crazy. |
Thanks @bainbrid. The test was indeed done with cms-data/RecoEgamma-ElectronIdentification#14 as can be checked in #31080 (comment). The comparison Slava posted earlier in #31080 (comment) is with My understanding is then:
Did I get it right? |
@bainbrid apologies for missing this. We need to check with our ID and reco experts. will get back to you offline on this. |
Yes.
Yes, changes are expected.
Yes.
Yes, thanks! |
Thanks! Can I ask the EGamma contacts who were already pinged @afiqaize @SohamBhattacharya to confirm that this change to the BDT output is fine in the mean time? |
Hi, sorry for missing this.
I'm checking and will get back asap.
…On Tue, 18 Aug, 2020, 3:50 PM Joosep Pata, ***@***.***> wrote:
Thanks! Can I ask the EGamma contacts who were already pinged @afiqaize
<https://github.com/afiqaize> @SohamBhattacharya
<https://github.com/SohamBhattacharya> to confirm that this change to the
BDT output is fine in the mean time?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#31080 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADUSNWSAOJ6TMJWQ6WBORWLSBJIWTANCNFSM4PW4LEPA>
.
|
@jpata - I'm afraid I'm probably best placed to comment on the BDT output ... but I'm afraid I cannot give a definitive answer. The reason in short is that the reference histogram is using a rather suboptimal model and so is a poor choice for reference (but the only one we have). Naviely, I'd expect to do better than this. However, the model we are testing is only a placeholder and so - without quite some digging - I cannot be sure if it should give better performance than the (suboptimal) default or not. Further, the distributions are an (unknown) mixture of signal and background, so this complicates the interpretation. Finally, I'm happy to move forward. The new distribution appears reasonably healthy in that it is smooth and there are no unusual spikes (e.g. at 0 or 1). So green light from me. We'll know more when we have the PAT sequence and the new model. |
Thanks @bainbrid. As you confirm it for EGamma, we can approve this PR from reco. I just want to make sure that nothing else depends on the BDT output in the mean time. |
+1
|
Afiq and I confirm that [1] is fine as simply modifying the standard sequence to produce a new collection would be the easiest way to do this. Again, sorry for the delay. |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
This PR concerns changes to the BDT model used for the identification of LowPtElectrons.
This PR requires the following external: cms-data/RecoEgamma-ElectronIdentification#14
The
LowPtGsfElectronIDProducer.cc
class has been adapted to handle changes to the list of features used by the new model in LowPtElectrons: updated BDT model for ID cms-data/RecoEgamma-ElectronIdentification#14We have reverse engineered some recent changes made to this package in the CMSSW_11_0_X cycle. The changes involve moving functionality back to the
interface
andsrc
directories. Utility methods defined inLowPtGsfElectronFeatures.h
guarantee the correct extraction of features (from both AOD and MINIAOD data tiers) for both evaluation (in CMSSW and outside) and training purposes (using workflows defined outside the cms-sw repo).These changes require the following external: LowPtElectrons: updated BDT model for ID cms-data/RecoEgamma-ElectronIdentification#14, which contains a "placeholder" model for the time being. The model weights will be updated again in a future PR once MC samples are available (in progress) for a final retraining.
Validation has been performed with the "placeholder" model and the performance is consistent with our expectations.